Week 3 Computer Lab Worksheet
Interactions and causal inference
Aims
In this session we continue working with data from Österman (2021) to replicate parts their analysis. In Week 2 we practised building very basic bivariate and multiple linear regression models. In this session, we will expand on those models and attempt to replicate (some of) the models reported in Table 3 of Österman (2021):
Setup
In Week 1 you set up R and RStudio, and an RProject folder (we called it “HSS8005_labs”) with an .R script and a .qmd or .Rmd document in it (we called these “Lab_1”). Ideally, you saved this on a cloud drive so you can access it from any computer (e.g. OneDrive). You will be working in this folder. If it’s missing, complete Task 2 from the Week 1 Lab.
You can create a new .R script and .qmd/.Rmd for this week’s work (e.g. “Lab_3”). It may be useful to document more of your coding process and interpretations, so you want to work in a .qmd/.Rmd document rather than an .R script.
Import data
As a first step, let’s import the osterman dataset that underpins the Österman (2021) article (see the Data page on the course website for information about the datasets available in this course):
Basic interactions
To continue on from the toy models that we fitted in Week 2, we may remember that we observed a change in the size of the effect and statistical significance of the gender variable (female) when also accounting for years of education and age. The two models compared as so:
m_m <- lm(trustindex3 ~ eduyrs25 + agea + female, data = osterman)
m_b <- lm(trustindex3 ~ female, data = m_m$model)
sjPlot::tab_model(m_b, m_m)| trustindex 3 | trustindex 3 | |||||
|---|---|---|---|---|---|---|
| Predictors | Estimates | CI | p | Estimates | CI | p |
| (Intercept) | 5.24 | 5.22 – 5.26 | <0.001 | 2.95 | 2.87 – 3.04 | <0.001 |
| female | 0.01 | -0.02 – 0.04 | 0.582 | 0.04 | 0.01 – 0.07 | 0.004 |
| eduyrs 25 | 0.12 | 0.11 – 0.12 | <0.001 | |||
| Age of respondent,calculated |
0.02 | 0.01 – 0.02 | <0.001 | |||
| Observations | 68211 | 68211 | ||||
| R2 / R2 adjusted | 0.000 / -0.000 | 0.063 / 0.063 | ||||
The question arises whether the gender variable interacts with the other predictors, particularly with education as that is the main focus of the research. In other words, we may have reasons to assume that the linear effect of education may be different for males and females. We can interact predictor variables in regression models using the : and * operators. With :, we include in the regression model only the interaction term, but not the component variables (i.e. their main effects); to include the component variables as well, we need to add them in as usual with the + operator. To note that we should normally want to include both main effects and and the interaction effects in a regression model. With *, we include both the main effects and the interaction effects, so it provides a shortcut to using :, but it is often useful to combine the two in more complex interaction scenarios.
In the code below, we interact gender with both education and age, and the two commands have the same result:
mxa <- lm(trustindex3 ~ eduyrs25 + agea +female + female*eduyrs25 + female*agea, data = osterman)
mxb <- lm(trustindex3 ~ female*(eduyrs25 + agea), data = osterman)
## Check that they are equivalent
sjPlot::tab_model(mxa, mxb)| trustindex 3 | trustindex 3 | |||||
|---|---|---|---|---|---|---|
| Predictors | Estimates | CI | p | Estimates | CI | p |
| (Intercept) | 3.25 | 3.13 – 3.36 | <0.001 | 3.25 | 3.13 – 3.36 | <0.001 |
| eduyrs 25 | 0.10 | 0.10 – 0.11 | <0.001 | 0.10 | 0.10 – 0.11 | <0.001 |
| Age of respondent,calculated |
0.01 | 0.01 – 0.02 | <0.001 | 0.01 | 0.01 – 0.02 | <0.001 |
| female | -0.53 | -0.69 – -0.36 | <0.001 | -0.53 | -0.69 – -0.36 | <0.001 |
| eduyrs25:female | 0.03 | 0.02 – 0.03 | <0.001 | |||
| agea:female | 0.00 | 0.00 – 0.01 | <0.001 | |||
| female:eduyrs25 | 0.03 | 0.02 – 0.03 | <0.001 | |||
| female:agea | 0.00 | 0.00 – 0.01 | <0.001 | |||
| Observations | 68211 | 68211 | ||||
| R2 / R2 adjusted | 0.064 / 0.064 | 0.064 / 0.064 | ||||
We find that both interaction terms (eduyrs25:female and agea:female) are statistically significant, which can tell us that there is an interaction that may be useful to include in the model. The greatest challenge is in interpreting results from interaction models, as the usual interpretation of the main effect terms no longer stands and the focus should be on the interaction terms instead. The best approach is to visualise the interactions using a graph, and there are various user-written functions that make this task easy. Just to keep it simple, we will use the plot_model() function from the {sjPlot} package.
First, we can check the interaction between education and gender:
This visualization in now easier to interpret than the numerical output. We see a steeper line for females than for males and the lines cross each other at around 12 years of completed schooling. Years spent in education therefore have a stronger influence on social trust for females than for males after around 12 years (which is generally the length of compulsory education in most European countries from which the data is drawn). Among those with education below this level, we find that males have higher levels of social trust, but longer educational careers have a stronger impact for women.
We can check the same for the interaction of gender and age:
The general pattern is the same, however, the overlapping confidence intervals up until about the age of 53 tell us that the observed difference between the two genders may be due to random variation in our sample. The effect of this interaction is weaker and less reliable than the one with education, and unless we have a strong theoretically motivated reason to include it in our model, we may want to keep a simpler model without this interaction effect.
Replicating Model 1
With this basic understanding of interaction terms, we can now attempt to replicate Model 1 from the Osterman article. Based on the description provided in the article and the appended materials we can reconstruct the main components of their model as shown below. As a first step, run the command and compare our results to those presented in Model 1. Bear in mind that the effects most coefficients are not presented in the main table (see also the table’s footnotes and Appendix 4 in the Supplementary Materials). We also did not include any survey weights in our model and did not include any error corrections, which the author does; so the results will not be exactly the same.
m1 <- lm(trustindex3 ~ reform1_7 + female + blgetmg_d + fbrneur + mbrneur + fnotbrneur + mnotbrneur + agea + agea + factor(essround) + yrbrn + yrbrn:yrbrn + yrbrn:factor(reform_id_num) + agea:factor(reform_id_num) + agea:agea + agea:agea:factor(reform_id_num) + factor(reform_id_num), data = osterman)
sjPlot::tab_model(m1)| trustindex 3 | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 55.15 | -27.45 – 137.76 | 0.191 |
| reform 1 7 | 0.06 | 0.01 – 0.12 | 0.020 |
| female | 0.06 | 0.03 – 0.08 | <0.001 |
| blgetmg d | -0.24 | -0.35 – -0.13 | <0.001 |
| Foreign-born father,born in Europe |
-0.11 | -0.19 – -0.03 | 0.009 |
| Foreign-born mother,born in Europe |
-0.11 | -0.19 – -0.02 | 0.015 |
| fnotbrneur | -0.06 | -0.21 – 0.08 | 0.382 |
| mnotbrneur | -0.11 | -0.26 – 0.04 | 0.152 |
| Age of respondent,calculated |
-0.02 | -0.06 – 0.02 | 0.354 |
| factor(essround)2 | 0.06 | -0.03 – 0.15 | 0.175 |
| factor(essround)3 | 0.16 | 0.02 – 0.31 | 0.029 |
| factor(essround)4 | 0.25 | 0.04 – 0.46 | 0.020 |
| factor(essround)5 | 0.37 | 0.09 – 0.65 | 0.009 |
| factor(essround)6 | 0.42 | 0.07 – 0.77 | 0.019 |
| factor(essround)7 | 0.47 | 0.06 – 0.89 | 0.025 |
| factor(essround)8 | 0.69 | 0.21 – 1.17 | 0.005 |
| factor(essround)9 | 0.84 | 0.29 – 1.40 | 0.003 |
| Year of birth | -0.03 | -0.07 – 0.02 | 0.233 |
| factor(reform_id_num)2 | 25.71 | -31.48 – 82.90 | 0.378 |
| factor(reform_id_num)3 | 45.08 | -13.98 – 104.15 | 0.135 |
| factor(reform_id_num)4 | -16.50 | -67.09 – 34.09 | 0.523 |
| factor(reform_id_num)5 | 50.81 | -16.81 – 118.43 | 0.141 |
| factor(reform_id_num)6 | -10.79 | -78.90 – 57.33 | 0.756 |
| factor(reform_id_num)7 | 40.38 | -13.19 – 93.96 | 0.140 |
| factor(reform_id_num)8 | 59.01 | 5.85 – 112.17 | 0.030 |
| factor(reform_id_num)9 | 30.98 | -29.20 – 91.16 | 0.313 |
| factor(reform_id_num)10 | 35.89 | -16.04 – 87.82 | 0.176 |
| factor(reform_id_num)11 | 26.89 | -24.05 – 77.82 | 0.301 |
| factor(reform_id_num)12 | 24.03 | -33.65 – 81.71 | 0.414 |
| factor(reform_id_num)13 | 41.87 | -10.16 – 93.91 | 0.115 |
| factor(reform_id_num)14 | -72.42 | -140.69 – -4.15 | 0.038 |
| factor(reform_id_num)15 | -4.37 | -55.16 – 46.41 | 0.866 |
| factor(reform_id_num)16 | 105.35 | 56.54 – 154.16 | <0.001 |
| factor(reform_id_num)17 | 15.07 | -44.38 – 74.51 | 0.619 |
| factor(reform_id_num)18 | 20.58 | -41.72 – 82.87 | 0.517 |
| factor(reform_id_num)19 | 18.69 | -57.64 – 95.02 | 0.631 |
| factor(reform_id_num)20 | 106.91 | -30.96 – 244.77 | 0.129 |
| factor(reform_id_num)21 | 0.48 | -48.64 – 49.61 | 0.985 |
| factor(reform_id_num)22 | 2.49 | -47.48 – 52.47 | 0.922 |
| factor(reform_id_num)23 | 5.58 | -56.32 – 67.48 | 0.860 |
| factor(reform_id_num)24 | 52.34 | -14.49 – 119.17 | 0.125 |
| factor(reform_id_num)25 | 64.69 | -3.71 – 133.09 | 0.064 |
| factor(reform_id_num)26 | 48.18 | -12.09 – 108.44 | 0.117 |
| factor(reform_id_num)27 | -3.59 | -56.72 – 49.54 | 0.895 |
| yrbrn:factor(reform_id_num)2 | -0.01 | -0.04 – 0.02 | 0.393 |
| yrbrn:factor(reform_id_num)3 | -0.02 | -0.05 – 0.01 | 0.135 |
| yrbrn:factor(reform_id_num)4 | 0.01 | -0.02 – 0.03 | 0.532 |
| yrbrn:factor(reform_id_num)5 | -0.02 | -0.06 – 0.01 | 0.161 |
| yrbrn:factor(reform_id_num)6 | 0.01 | -0.03 – 0.04 | 0.699 |
| yrbrn:factor(reform_id_num)7 | -0.02 | -0.05 – 0.01 | 0.144 |
| yrbrn:factor(reform_id_num)8 | -0.03 | -0.06 – -0.00 | 0.032 |
| yrbrn:factor(reform_id_num)9 | -0.01 | -0.05 – 0.02 | 0.336 |
| yrbrn:factor(reform_id_num)10 | -0.02 | -0.04 – 0.01 | 0.181 |
| yrbrn:factor(reform_id_num)11 | -0.01 | -0.04 – 0.01 | 0.293 |
| yrbrn:factor(reform_id_num)12 | -0.01 | -0.04 – 0.02 | 0.390 |
| yrbrn:factor(reform_id_num)13 | -0.02 | -0.05 – 0.01 | 0.116 |
| yrbrn:factor(reform_id_num)14 | 0.04 | 0.00 – 0.07 | 0.040 |
| yrbrn:factor(reform_id_num)15 | 0.00 | -0.02 – 0.03 | 0.888 |
| yrbrn:factor(reform_id_num)16 | -0.05 | -0.08 – -0.03 | <0.001 |
| yrbrn:factor(reform_id_num)17 | -0.01 | -0.04 – 0.02 | 0.621 |
| yrbrn:factor(reform_id_num)18 | -0.01 | -0.04 – 0.02 | 0.526 |
| yrbrn:factor(reform_id_num)19 | -0.01 | -0.05 – 0.03 | 0.645 |
| yrbrn:factor(reform_id_num)20 | -0.05 | -0.12 – 0.02 | 0.130 |
| yrbrn:factor(reform_id_num)21 | 0.00 | -0.02 – 0.02 | 0.993 |
| yrbrn:factor(reform_id_num)22 | -0.00 | -0.03 – 0.02 | 0.869 |
| yrbrn:factor(reform_id_num)23 | -0.00 | -0.03 – 0.03 | 0.870 |
| yrbrn:factor(reform_id_num)24 | -0.03 | -0.06 – 0.01 | 0.118 |
| yrbrn:factor(reform_id_num)25 | -0.03 | -0.07 – 0.00 | 0.062 |
| yrbrn:factor(reform_id_num)26 | -0.02 | -0.05 – 0.01 | 0.117 |
| yrbrn:factor(reform_id_num)27 | 0.00 | -0.02 – 0.03 | 0.853 |
| agea:factor(reform_id_num)2 | -0.02 | -0.04 – -0.00 | 0.019 |
| agea:factor(reform_id_num)3 | -0.01 | -0.03 – 0.01 | 0.310 |
| agea:factor(reform_id_num)4 | 0.01 | -0.01 – 0.03 | 0.240 |
| agea:factor(reform_id_num)5 | -0.02 | -0.05 – -0.00 | 0.028 |
| agea:factor(reform_id_num)6 | -0.02 | -0.04 – 0.01 | 0.174 |
| agea:factor(reform_id_num)7 | -0.02 | -0.04 – -0.00 | 0.018 |
| agea:factor(reform_id_num)8 | -0.03 | -0.05 – -0.02 | <0.001 |
| agea:factor(reform_id_num)9 | -0.01 | -0.03 – 0.01 | 0.205 |
| agea:factor(reform_id_num)10 | -0.02 | -0.04 – -0.01 | 0.006 |
| agea:factor(reform_id_num)11 | -0.00 | -0.02 – 0.01 | 0.569 |
| agea:factor(reform_id_num)12 | 0.02 | -0.01 – 0.04 | 0.163 |
| agea:factor(reform_id_num)13 | -0.01 | -0.02 – 0.01 | 0.509 |
| agea:factor(reform_id_num)14 | 0.01 | -0.02 – 0.03 | 0.673 |
| agea:factor(reform_id_num)15 | -0.00 | -0.02 – 0.01 | 0.561 |
| agea:factor(reform_id_num)16 | -0.04 | -0.06 – -0.03 | <0.001 |
| agea:factor(reform_id_num)17 | 0.01 | -0.01 – 0.03 | 0.349 |
| agea:factor(reform_id_num)18 | -0.03 | -0.05 – -0.01 | 0.004 |
| agea:factor(reform_id_num)19 | -0.00 | -0.02 – 0.02 | 0.894 |
| agea:factor(reform_id_num)20 | -0.02 | -0.09 – 0.05 | 0.592 |
| agea:factor(reform_id_num)21 | 0.00 | -0.02 – 0.02 | 0.994 |
| agea:factor(reform_id_num)22 | 0.00 | -0.01 – 0.02 | 0.537 |
| agea:factor(reform_id_num)23 | -0.04 | -0.07 – -0.02 | 0.001 |
| agea:factor(reform_id_num)24 | -0.02 | -0.04 – 0.01 | 0.145 |
| agea:factor(reform_id_num)25 | -0.02 | -0.04 – -0.00 | 0.026 |
| agea:factor(reform_id_num)26 | -0.03 | -0.05 – -0.00 | 0.018 |
| agea:factor(reform_id_num)27 | -0.01 | -0.02 – 0.01 | 0.571 |
| Observations | 68796 | ||
| R2 / R2 adjusted | 0.199 / 0.198 | ||
Exercise 1
This models provides a lot to play around with. For this exercise, start from this complex model and start simplifying it in various ways. Try excluding some of the interaction terms and check the change in the results. Plot the interaction terms and attempt an interpretation. Note down your thoughts and models. Playing around with this model is a good way to learn about the effect of interactions.
Exercise 2
Read through the Osterman article and check your understanding of how the author interprets these results. Particularly, how does the interaction model help us elucidate causal factors in the regression model?